We present a novel method for the automatic contouring of the left ventricle in 2D echocardiographic images. Unlike most existing segmentation methods, which are based on predicting segmentation masks, we focus on predicting the endocardial contour and key landmark points within this contour (the base and the apex). This provides a representation that is closer to how experts perform manual annotation and hence produces results that are more physiologically plausible. Our proposed method uses a two-headed network based on the U-Net architecture: one head predicts 7 contour points, while the other predicts a distance map to the contour. This approach was compared to U-Net and to a point-based approach, achieving a performance gain of 30% in terms of landmark localisation (<4.5 mm) and distance to the ground-truth contour (<3.5 mm).
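The distance-map target regressed by the second head can be illustrated with a minimal numpy sketch; the function name and grid layout below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def contour_distance_map(points, height, width):
    """Distance from every pixel to the nearest contour point.

    `points` is an (N, 2) array of (row, col) contour coordinates -- in the
    paper's setting, landmark points on the endocardial contour. The
    resulting map serves as a regression target for the second head.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    grid = np.stack([rows, cols], axis=-1).astype(float)    # (H, W, 2)
    # Pairwise distance of each pixel to each contour point, then the minimum.
    diffs = grid[:, :, None, :] - points[None, None, :, :]  # (H, W, N, 2)
    return np.sqrt((diffs ** 2).sum(-1)).min(-1)            # (H, W)

# Toy example: a 3-point "contour" on an 8x8 grid.
pts = np.array([[2.0, 2.0], [4.0, 4.0], [6.0, 2.0]])
dmap = contour_distance_map(pts, 8, 8)
```

The map is zero exactly on the contour points and grows with distance from them, which gives the network a dense training signal even though the contour itself is sparse.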
Accurate uncertainty estimation is a critical need for the medical imaging community. A variety of methods have been proposed, all direct extensions of classification uncertainty estimation techniques. Pixel-wise independent uncertainty estimates, usually based on a probabilistic interpretation of the neural network, do not take prior anatomical knowledge into account and therefore provide sub-optimal results for many segmentation tasks. We therefore propose CRISP, a ContRastive Image Segmentation method for uncertainty Prediction. At its core, CRISP implements a contrastive method to learn a joint latent space that encodes the distribution of valid segmentations and their corresponding images. We use this joint latent space to compare predictions with thousands of latent vectors and provide anatomically consistent uncertainty maps. Comprehensive studies performed on four medical image databases involving different modalities and organs underline the superiority of our method compared to state-of-the-art approaches.
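The comparison step can be sketched roughly as follows: weight a bank of known-valid segmentations by latent similarity to the prediction and read off a pixel-wise uncertainty. This is a simplified stand-in with illustrative names, not CRISP's actual architecture, which involves learned encoders:

```python
import numpy as np

def crisp_style_uncertainty(z_pred, z_bank, mask_bank, tau=0.1):
    """Anatomically informed uncertainty map, in the spirit of CRISP.

    z_pred:    latent code of the predicted segmentation, shape (d,)
    z_bank:    latent codes of known-valid segmentations, shape (K, d)
    mask_bank: the corresponding binary masks, shape (K, H, W)

    Each bank mask is weighted by its cosine similarity to the prediction
    (softmax with temperature `tau`); the pixel-wise Bernoulli variance of
    the weighted ensemble is returned as an uncertainty map.
    """
    z = z_pred / np.linalg.norm(z_pred)
    b = z_bank / np.linalg.norm(z_bank, axis=1, keepdims=True)
    sims = b @ z                               # (K,) cosine similarities
    w = np.exp(sims / tau)
    w /= w.sum()
    mean = np.tensordot(w, mask_bank, axes=1)  # (H, W) weighted mean mask
    return mean * (1.0 - mean)                 # per-pixel Bernoulli variance

z = np.array([1.0, 0.0])
bank = np.array([[1.0, 0.0], [0.0, 1.0]])
masks = np.zeros((2, 4, 4))
masks[0, :2] = 1.0
masks[1, 2:] = 1.0
u = crisp_style_uncertainty(z, bank, masks)
```

Because the weights come from a latent space trained only on valid segmentations, disagreement among the nearest bank entries concentrates exactly where the prediction is anatomically questionable.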
We present a novel corpus for French dialect identification comprising 413,522 French text samples collected from public news websites in Belgium, Canada, France and Switzerland. To ensure an accurate estimation of the dialect identification performance of models, we designed the corpus to eliminate potential biases related to topic, writing style, and publication source. More precisely, the training, validation and test splits are collected from different news websites, while searching for different keywords (topics). This leads to a French cross-domain (FreCDo) dialect identification task. We conduct experiments with four competitive baselines: a fine-tuned CamemBERT model, an XGBoost classifier based on fine-tuned CamemBERT features, a Support Vector Machines (SVM) classifier based on fine-tuned CamemBERT features, and an SVM based on word n-grams. Aside from presenting quantitative results, we also make an analysis of the most discriminative features learned by CamemBERT. Our corpus is available at https://github.com/MihaelaGaman/FreCDo.
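The word n-gram features behind the SVM baseline can be extracted with a few lines of standard-library Python (a generic sketch, not the authors' exact pipeline):

```python
from collections import Counter

def word_ngrams(text, n=2):
    """Count word n-grams of the kind used by n-gram SVM baselines.

    Tokenisation here is a naive lowercase whitespace split; real
    pipelines typically normalise punctuation and accents as well.
    """
    toks = text.lower().split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

feats = word_ngrams("le chat dort le chat", n=2)
```

These sparse counts would then be vectorised (e.g. TF-IDF weighted) before being fed to the SVM.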
Causal deep learning (CDL) is a new and important research area in the larger field of machine learning. With CDL, researchers aim to structure and encode causal knowledge in the extremely flexible representation space of deep learning models. Doing so will lead to more informed, robust, and general predictions and inference -- which is important! However, CDL is still in its infancy. For example, it is not clear how we ought to compare different methods as they are so different in their output, the way they encode causal knowledge, or even how they represent this knowledge. This is a living paper that categorises methods in causal deep learning beyond Pearl's ladder of causation. We refine the rungs in Pearl's ladder, while also adding a separate dimension that categorises the parametric assumptions of both input and representation, arriving at the map of causal deep learning. Our map covers machine learning disciplines such as supervised learning, reinforcement learning, generative modelling and beyond. Our paradigm is a tool which helps researchers to: find benchmarks, compare methods, and most importantly: identify research gaps. With this work we aim to structure the avalanche of papers being published on causal deep learning. While papers on the topic are being published daily, our map remains fixed. We open-source our map for others to use as they see fit: perhaps to offer guidance in a related works section, or to better highlight the contribution of their paper.
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences. Fair ML has largely focused on the protection of single attributes in the simpler setting where both attributes and target outcomes are binary. However, many practical real-world problems entail the simultaneous protection of multiple sensitive attributes, which are often not simply binary, but continuous or categorical. To address this more challenging task, we introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert spaces. This leads to two practical tools: first, the FairCOCCO Score, a normalised metric that can quantify fairness in settings with single or multiple sensitive attributes of arbitrary type; and second, a subsequent regularisation term that can be incorporated into arbitrary learning objectives to obtain fair predictors. These contributions address crucial gaps in the algorithmic fairness literature, and we empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
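The flavour of such kernel dependence measures can be sketched with a biased HSIC-style estimator in numpy; this is a simplified stand-in for illustration, not the normalised FairCOCCO Score itself:

```python
import numpy as np

def rbf_gram(x, gamma=1.0):
    """RBF Gram matrix for 1-D samples x."""
    d = x[:, None] - x[None, :]
    return np.exp(-gamma * d ** 2)

def kernel_dependence(y_pred, sensitive, gamma=1.0):
    """Biased HSIC estimate between predictions and a sensitive attribute.

    Both inputs are 1-D arrays of length n; the score is near zero under
    independence and grows with statistical dependence, for sensitive
    attributes of arbitrary (continuous or categorical) type.
    """
    n = len(y_pred)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = rbf_gram(np.asarray(y_pred, float), gamma)
    L = rbf_gram(np.asarray(sensitive, float), gamma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

A score like this can be subtracted (scaled by a trade-off weight) from a learning objective to penalise predictors whose outputs depend on the protected attribute.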
While there have been a number of remarkable breakthroughs in machine learning (ML), much of the focus has been placed on model development. However, to truly realize the potential of machine learning in real-world settings, additional aspects must be considered across the ML pipeline. Data-centric AI is emerging as a unifying paradigm that could enable such reliable end-to-end pipelines. However, this remains a nascent area with no standardized framework to guide practitioners to the necessary data-centric considerations or to communicate the design of data-centric ML systems. To address this gap, we propose DC-Check, an actionable checklist-style framework to elicit data-centric considerations at different stages of the ML pipeline: Data, Training, Testing, and Deployment. This data-centric lens on development aims to promote thoughtfulness and transparency prior to system development. Additionally, we highlight specific data-centric AI challenges and research opportunities. DC-Check is aimed at both practitioners and researchers to guide day-to-day development. As such, to easily engage with and use DC-Check and associated resources, we provide a DC-Check companion website (https://www.vanderschaar-lab.com/dc-check/). The website will also serve as an updated resource as methods and tooling evolve over time.
In many contexts, simpler models are preferable to more complex ones, and the control of this model complexity is the goal of many methods in machine learning, such as regularization, hyperparameter tuning, and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not well suited to deep neural networks. Here we develop the notion of geometric complexity, a measure of the variability of the model function computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics, such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization, all act to control geometric complexity, providing a unifying framework in which to characterize the behaviour of deep learning models.
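A discrete Dirichlet energy of this kind can be computed directly; the finite-difference sketch below is a generic illustration, not the paper's exact estimator:

```python
import numpy as np

def geometric_complexity(f, X, eps=1e-3):
    """Discrete Dirichlet energy of a model function f over samples X.

    Approximates (1/n) * sum_i ||grad f(x_i)||^2 with central finite
    differences, measuring how much the model function varies around
    the data.
    """
    X = np.atleast_2d(X).astype(float)
    n, d = X.shape
    grads = np.zeros((n, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        grads[:, j] = (f(X + e) - f(X - e)) / (2 * eps)
    return (grads ** 2).sum(axis=1).mean()

# A steeper linear map has higher geometric complexity than a flatter one.
X = np.random.default_rng(0).normal(size=(50, 2))
flat = geometric_complexity(lambda Z: 0.1 * Z.sum(axis=1), X)
steep = geometric_complexity(lambda Z: 3.0 * Z.sum(axis=1), X)
```

For a linear map with slope c in each of d coordinates the energy is exactly d * c**2, which makes the measure easy to sanity-check before applying it to a trained network.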
We study and introduce new gradient operators in the complex and bicomplex settings, inspired by the well-known least mean squares (LMS) algorithm invented in 1960 for the adaptive linear neuron (Adaline). These gradient operators will be used to formulate new learning rules for the bicomplex least mean squares (BLMS) algorithm. This approach extends both the classical real and complex LMS algorithms.
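For reference, the classical complex LMS update that these operators generalise can be written in a few lines; this is the standard textbook form, not the paper's bicomplex learning rules:

```python
import numpy as np

def complex_lms(x, d, order=4, mu=0.05):
    """Classical complex LMS adaptive filter.

    x:  complex input signal
    d:  desired (reference) signal
    Returns the final tap-weight vector and the error sequence. The
    update w <- w + mu * u * conj(e) descends the squared-error surface
    with respect to the complex weights.
    """
    n = len(x)
    w = np.zeros(order, dtype=complex)
    err = np.zeros(n, dtype=complex)
    for k in range(order, n):
        u = x[k - order:k][::-1]   # most recent taps first
        y = np.vdot(w, u)          # filter output w^H u
        e = d[k] - y
        w = w + mu * u * np.conj(e)
        err[k] = e
    return w, err
```

When the desired signal is generated by a fixed FIR filter of the input, the error magnitude decays as the weights converge toward the true filter.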
Concept-based explanations permit to understand the predictions of a deep neural network (DNN) through the lens of concepts specified by users. Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the DNN's latent space. When this holds true, the concept can be represented by a concept activation vector (CAV) pointing in that direction. In this work, we propose to relax this assumption by allowing concept examples to be scattered across different clusters in the DNN's latent space. Each concept is then represented by a region of the DNN's latent space that includes these clusters and that we call concept activation region (CAR). To formalize this idea, we introduce an extension of the CAV formalism that is based on the kernel trick and support vector classifiers. This CAR formalism yields global concept-based explanations and local concept-based feature importance. We prove that CAR explanations built with radial kernels are invariant under latent space isometries. In this way, CAR assigns the same explanations to latent spaces that have the same geometry. We further demonstrate empirically that CARs offer (1) more accurate descriptions of how concepts are scattered in the DNN's latent space; (2) closer alignment with human concept annotations; and (3) concept-based feature importance scores that meaningfully relate concepts to each other. Finally, we use CARs to show that DNNs can autonomously rediscover known scientific concepts, such as the prostate cancer grading system.
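The kernel-trick idea can be sketched as follows, using kernel ridge regression in place of the paper's support-vector classifier to stay dependency-free; class and method names are illustrative:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class ConceptRegion:
    """Concept activation region via the kernel trick.

    Fits a kernel classifier separating latent codes of concept examples
    (label +1) from negatives (label -1). Unlike a linear CAV, the RBF
    kernel can represent a concept scattered across several clusters.
    """

    def __init__(self, gamma=1.0, lam=1e-2):
        self.gamma, self.lam = gamma, lam

    def fit(self, Z, y):
        self.Z = np.asarray(Z, float)
        K = rbf(self.Z, self.Z, self.gamma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(y)),
                                     np.asarray(y, float))
        return self

    def concept_score(self, Znew):
        """Positive inside the concept region, negative outside."""
        return rbf(np.asarray(Znew, float), self.Z, self.gamma) @ self.alpha
```

With concept positives placed in two opposite clusters (an XOR-like layout that defeats any single direction), the kernel classifier still recovers the region, which is precisely the point of moving from CAVs to CARs.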
We study the problem of adaptively identifying patient subpopulations that benefit from a given treatment during a confirmatory clinical trial. This type of adaptive clinical trial, often referred to as an adaptive enrichment design, has been thoroughly studied in biostatistics with a focus on a limited number of subgroups (typically two) that make up the (sub)populations, and a small number of interim analysis points. In this paper, we aim to relax classical restrictions on such designs and investigate how to incorporate ideas from the recent machine learning literature on adaptive and online experimentation to make trials more flexible and efficient. We find that the unique characteristics of the subpopulation selection problem -- most importantly that (i) one is usually interested in finding subpopulations with any treatment benefit (and not necessarily the single subgroup with the largest effect) given a limited budget, and that (ii) effectiveness only has to be demonstrated across the subpopulation on average -- give rise to interesting challenges and new desiderata when designing algorithmic solutions. Building on these findings, we propose AdaGGI and AdaGCPI, two meta-algorithms for subpopulation construction, which focus respectively on identifying good subgroups and good composite subpopulations. We empirically investigate their performance across a range of simulation scenarios and derive insights into their (dis)advantages across different settings.
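To make the problem setting concrete, here is a generic budgeted "good-arm identification" loop in numpy. It is an illustrative baseline under a bounded-effect assumption, not AdaGGI or AdaGCPI:

```python
import numpy as np

def identify_beneficial_subgroups(sample_fn, n_groups, budget, delta=0.05):
    """Return subgroups with evidence of *any* treatment benefit.

    `sample_fn(g)` draws one observed treatment effect for subgroup g,
    assumed bounded in [-1, 1]. The budget is spent round-robin, and a
    subgroup is returned if the Hoeffding lower confidence bound on its
    mean effect ends above zero.
    """
    sums = np.zeros(n_groups)
    counts = np.zeros(n_groups, dtype=int)
    for t in range(budget):
        g = t % n_groups
        sums[g] += sample_fn(g)
        counts[g] += 1
    means = sums / np.maximum(counts, 1)
    rad = np.sqrt(2 * np.log(2 * n_groups / delta) / np.maximum(counts, 1))
    return [g for g in range(n_groups) if means[g] - rad[g] > 0]
```

Note how this objective differs from best-arm identification: every subgroup whose lower bound clears zero is returned, rather than only the single subgroup with the largest estimated effect.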